Abstract. Let $s_q(n)$ denote the base $q$ sum of digits function, which for $n < x$ is centered around $\frac{q-1}{2}\log_q x$. In [3], Drmota, Mauduit and Rivat study the sum of digits of prime numbers and provide asymptotics for the size of the set $\{p < x,\ p \text{ prime} : s_q(p) = \alpha(q-1)\log_q x\}$, where $\alpha$ lies in a range depending on some constant $K$. In this note, we examine the tails of this distribution, and prove that
\[ |\{p < x,\ p \text{ prime} : s_q(p) > \alpha(q-1)\log_q x\}| \gg x^{2(1-\alpha)} e^{-c(\log x)^{1/2+\epsilon}} \]
for $\frac12 < \alpha < 0.7375$. This proves that there are infinitely many primes with more than twice as many ones as zeros in their binary expansion.


1. Introduction 

A prime number which can be written in the form $2^n - 1$ has only ones in its binary expansion, and is called a Mersenne prime. The first few such primes are 3, 7, 31, and 127. Currently, the largest known prime is of this form, and it has over 12.9 million digits. These numbers have been studied for centuries, going back to Euclid, who was interested in them for their connection with perfect numbers, something that we will not explore here. It is a long-standing conjecture that there are infinitely many Mersenne primes, and currently this seems entirely out of reach of modern analytic methods. However, we may weaken the condition and ask about primes with a large number of 1's in their base 2 expansion. With this in mind, we ask the following motivational question:



Problem 1. Are there infinitely many primes with more than twice as many ones as zeros in their binary expansion?
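To get a feel for Problem 1, the condition can be checked directly on small primes. The following is a minimal sketch: the helper names `is_prime`, `ones_and_zeros` and `primes_with_many_ones` are illustrative choices, not notation from this note, and trial division is only adequate for small ranges.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate only for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def ones_and_zeros(n: int) -> tuple[int, int]:
    """Count the ones and zeros in the binary expansion of n >= 1."""
    bits = bin(n)[2:]  # strip the '0b' prefix
    return bits.count("1"), bits.count("0")


def primes_with_many_ones(limit: int) -> list[int]:
    """Primes p < limit with more than twice as many ones as zeros."""
    result = []
    for p in range(2, limit):
        ones, zeros = ones_and_zeros(p)
        if ones > 2 * zeros and is_prime(p):
            result.append(p)
    return result


print(primes_with_many_ones(64))
```

Mersenne primes such as 3, 7 and 31 satisfy the condition trivially, since their expansions contain no zeros; the question is whether the list continues indefinitely.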

If we let $s_q(n)$ denote the sum of the digits of $n$ written in base $q$, then we are asking if there are infinitely many primes $p$ which satisfy $s_2(p) > \frac23 \log_2 p$. Moving to a slightly more general setting, we will look at the sum of digits base $q$ rather than just the binary case. The average of $s_q(n)$ is roughly $\frac{q-1}{2}$ multiplied by the number of digits, so we have the asymptotic
\[ \sum_{n<x} s_q(n) \sim \frac{q-1}{2}\, x \log_q x. \]
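The heuristic that the average digit is $\frac{q-1}{2}$ can be checked numerically. The sketch below compares the empirical mean of $s_q(n)$ over $0 \le n < x$ with the leading term $\frac{q-1}{2}\log_q x$; the function names are illustrative assumptions, and the comparison is exact only when $x$ is a power of $q$.

```python
import math


def digit_sum(n: int, q: int) -> int:
    """Return s_q(n), the sum of the base-q digits of n >= 0."""
    total = 0
    while n > 0:
        n, r = divmod(n, q)
        total += r
    return total


def mean_digit_sum(x: int, q: int) -> float:
    """Average of s_q(n) over the integers 0 <= n < x."""
    return sum(digit_sum(n, q) for n in range(x)) / x


def heuristic_mean(x: int, q: int) -> float:
    """Leading term (q - 1)/2 * log_q(x) of the average."""
    return (q - 1) / 2 * math.log(x) / math.log(q)


# Compare the two quantities for a few bases and ranges.
for q, x in [(2, 2**12), (10, 10**4), (3, 50_000)]:
    print(q, x, mean_digit_sum(x, q), heuristic_mean(x, q))
```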

However, things become much more complicated when we restrict ourselves to the prime numbers. In 1946 Copeland and Erdős [2] proved that
\[ \frac{1}{\pi(x)} \sum_{p<x} s_q(p) \sim \frac{q-1}{2}\log_q x, \]
where $\pi(x) = \sum_{p<x} 1$ is the prime counting function, and a more precise error term was subsequently given by Shiokawa [1]. In 2009, Drmota, Mauduit and Rivat [3] gave exact asymptotics for the size of the set
\[ \{p < x,\ p \text{ prime} : s_q(p) = \alpha(q-1)\log_q x\}, \]
where $\alpha$ lies in a range specified in [3] and is chosen so that $\alpha(q-1)\log_q x$ is an integer which avoids certain congruence conditions.
However, these results do not allow us to draw any conclusions about Problem 1. In [3] they also asked about finding non-trivial bounds for the sum $\sum_{p<x} 2^{s_q(p)}$, as this would yield results regarding the tail distribution of the sum of digits of primes, that is, lower bounds for the size of sets of primes of the form
\[ \{p < x,\ p \text{ prime} : s_q(p) > \alpha(q-1)\log_q x\}, \]
where $\alpha > \frac12$. These are exactly the type of bounds we are looking for in order to answer our question, as Problem 1 is the case $\alpha = \frac23$ and $q = 2$. In this note, we provide such lower bounds, and prove the following:

Theorem 2. Given $0.2625 < \beta < \frac12$ and $\frac12 < \alpha < 0.7375$, for sufficiently large $x$ we have that
\[ |\{p < x,\ p \text{ prime} : s_q(p) > \alpha(q-1)\log_q x\}| \gg_\epsilon x^{2(1-\alpha)} e^{-c(\log x)^{1/2+\epsilon}} \]
and
\[ |\{p < x,\ p \text{ prime} : s_q(p) < \beta(q-1)\log_q x\}| \gg_\epsilon x^{2\beta} e^{-c(\log x)^{1/2+\epsilon}}. \]
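The two sets in the theorem can be enumerated for small $x$. The sketch below counts primes in the upper and lower tails; it is only an illustration of the definitions, since the theorem concerns sufficiently large $x$ and small ranges say nothing about the asymptotic bounds. The names `primes_below` and `tail_counts` are illustrative assumptions.

```python
import math


def digit_sum(n: int, q: int) -> int:
    """Return s_q(n), the sum of the base-q digits of n >= 0."""
    total = 0
    while n > 0:
        n, r = divmod(n, q)
        total += r
    return total


def primes_below(x: int) -> list[int]:
    """All primes p < x, via the sieve of Eratosthenes."""
    if x < 2:
        return []
    sieve = bytearray([1]) * x
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(x - 1) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, x, i)))
    return [i for i in range(x) if sieve[i]]


def tail_counts(x: int, q: int, alpha: float, beta: float) -> tuple[int, int]:
    """Count primes p < x with s_q(p) > alpha*(q-1)*log_q(x), and
    primes p < x with s_q(p) < beta*(q-1)*log_q(x)."""
    log_q_x = math.log(x) / math.log(q)
    upper = lower = 0
    for p in primes_below(x):
        s = digit_sum(p, q)
        if s > alpha * (q - 1) * log_q_x:
            upper += 1
        if s < beta * (q - 1) * log_q_x:
            lower += 1
    return upper, lower


# Binary case with alpha = 2/3, the threshold relevant to Problem 1.
print(tail_counts(10**5, 2, 2 / 3, 1 / 3))
```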

We do not examine the sum $\sum_{p<x} 2^{s_q(p)}$; rather, we note that the multinomial distribution is sharply peaked, so results regarding primes in short intervals allow us to attain such a lower bound. Problem 1 follows from Theorem 2 as a corollary. In fact, for any $\alpha < 0.7375$ there are infinitely many primes for which the proportion of 1's in their binary expansion is greater than $\alpha$.

2. The Tail Distribution 
We start by providing bounds on the size of the tails of the multinomial distribution. 
Lemma 3. (Chernoff bound) Given $\frac12 < \alpha < 1$, we have that



\[ |\{n < q^k : \alpha(q-1)k < s_q(n)\}| < \exp(-\ldots) \]
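The quantity on the left-hand side of the lemma can be computed exactly for moderate $k$ by convolving the digit distribution with itself. The sketch below does this and compares the resulting proportion with Hoeffding's inequality, $\exp(-2k(\alpha - \frac12)^2)$, a standard bound that is swapped in here as a reference point and is not claimed to be the constant of Lemma 3. The function names are illustrative assumptions.

```python
import math


def digit_sum_distribution(q: int, k: int) -> list[int]:
    """counts[s] = number of n in [0, q^k) with s_q(n) = s."""
    counts = [1]  # zero digits: only the empty sum 0
    for _ in range(k):
        new = [0] * (len(counts) + q - 1)
        for s, c in enumerate(counts):
            for d in range(q):  # append one more digit d
                new[s + d] += c
        counts = new
    return counts


def upper_tail_count(q: int, k: int, alpha: float) -> int:
    """Number of n < q^k with s_q(n) > alpha * (q - 1) * k."""
    threshold = alpha * (q - 1) * k
    counts = digit_sum_distribution(q, k)
    return sum(c for s, c in enumerate(counts) if s > threshold)


def hoeffding_bound(k: int, alpha: float) -> float:
    """Hoeffding's upper bound on the tail proportion (not Lemma 3 itself)."""
    return math.exp(-2 * k * (alpha - 0.5) ** 2)


# The tail proportion shrinks quickly as k grows, illustrating the sharp peak.
for q, k in [(2, 20), (2, 60), (10, 30)]:
    proportion = upper_tail_count(q, k, 0.7) / q**k
    print(q, k, proportion, hoeffding_bound(k, 0.7))
```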




Proof. For an integer chosen uniformly at random from $[0, q^k - 1]$, each digit can be thought of as an independent random variable which corresponds to the roll of a $q$-sided die with sides $0, 1, \ldots, q-1$. Normalizing, let $\xi$ be a random variable where
\[ P\left(\xi = \frac{2j}{q-1} - 1\right) = \frac{1}{q} \]
for $0 \le j \le q-1$, and for each $i$ let $\xi_i$ be an independent copy of $\xi$. Our goal is then to examine
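\[ P(\gamma k \le \xi_1 + \xi_2 + \cdots + \xi_k) \]
for a threshold $\gamma > 0$. For any nonnegative $t$,
\[ P(\gamma k \le \xi_1 + \xi_2 + \cdots + \xi_k) \le e^{-t\gamma k}\, E\left(e^{t(\xi_1 + \cdots + \xi_k)}\right) = \left(e^{-t\gamma} E(e^{t\xi})\right)^k = e^{-k I(t,\gamma)} \]
where
\[ I(t,\gamma) = t\gamma - \log E(e^{t\xi}). \]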
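Evaluating the expectation, we find that
\[ E(e^{t\xi}) = \sum_{j=0}^{q-1} \frac{1}{q}\, e^{t\left(\frac{2j}{q-1} - 1\right)} = \frac{e^{-t}}{q} \sum_{j=0}^{q-1} \left(e^{\frac{2t}{q-1}}\right)^j = \frac{1}{q}\, \frac{\sinh\left(t + \frac{t}{q-1}\right)}{\sinh\left(\frac{t}{q-1}\right)}. \]
This gives rise to the series expansion
\[ \log\left(\frac{\sinh\left(t + \frac{t}{q-1}\right)}{q \sinh\left(\frac{t}{q-1}\right)}\right) = \frac{q+1}{6(q-1)}\, t^2 - \frac{q^3+q^2+q+1}{180(q-1)^3}\, t^4 + O(t^6), \]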

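allowing us to prove that
\[ \log E(e^{t\xi}) \le \frac{q+1}{6(q-1)}\, t^2, \qquad \text{so that} \qquad I(t,\gamma) \ge t\gamma - \frac{q+1}{6(q-1)}\, t^2. \]
To maximize this lower bound for $I(t,\gamma)$, we choose $t = \frac{3\gamma(q-1)}{q+1}$ and obtain the upper bound
\[ P(\gamma k \le \xi_1 + \xi_2 + \cdots + \xi_k) \le \exp\left(-\frac{3(q-1)}{2(q+1)}\, \gamma^2 k\right), \]
which proves the lemma since $q \ge 2$. □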
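The closed form for the moment generating function and its series expansion can be sanity-checked numerically. The sketch below assumes the normalization in which $\xi$ takes the values $\frac{2j}{q-1} - 1$ for $0 \le j \le q-1$, each with probability $\frac1q$; this normalization, and all function names, are assumptions made for the illustration.

```python
import math


def mgf(t: float, q: int) -> float:
    """E(exp(t * xi)) by direct summation, with xi = 2j/(q-1) - 1 (assumed)."""
    return sum(math.exp(t * (2 * j / (q - 1) - 1)) for j in range(q)) / q


def mgf_closed_form(t: float, q: int) -> float:
    """sinh(t + t/(q-1)) / (q * sinh(t/(q-1))); requires t != 0."""
    return math.sinh(t + t / (q - 1)) / (q * math.sinh(t / (q - 1)))


def log_mgf_series(t: float, q: int) -> float:
    """First two terms of the expansion of log E(exp(t * xi)) around t = 0."""
    quadratic = (q + 1) / (6 * (q - 1)) * t**2
    quartic = (q**3 + q**2 + q + 1) / (180 * (q - 1) ** 3) * t**4
    return quadratic - quartic


for q in (2, 3, 10):
    for t in (0.05, 0.2, 1.0):
        direct = mgf(t, q)
        print(q, t, direct, mgf_closed_form(t, q),
              math.log(direct), log_mgf_series(t, q))
```

For $q = 2$ the closed form reduces to $\cosh t$, which gives a quick independent check of the formula.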

Next, we will need the best existing results on prime gaps. In 2001, Baker, Harman and Pintz proved that
\[ \pi(x + x^\theta) - \pi(x) \gg \frac{x^\theta}{\log x} \tag{2.1} \]
for any $\theta \ge 0.525$ [1]. Armed with equation (2.1) and Lemma 3, we are now ready to prove Theorem 2.
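The left-hand side of (2.1) is easy to compute for small $x$ and to compare with $x^\theta / \log x$. The sketch below does so with trial division; it only illustrates the shape of the statement, which is asymptotic, and the helper names are illustrative assumptions.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate only for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def primes_in_short_interval(x: int, theta: float = 0.525) -> int:
    """pi(x + x^theta) - pi(x): the number of primes in (x, x + x^theta]."""
    width = int(x**theta)
    return sum(1 for n in range(x + 1, x + width + 1) if is_prime(n))


# Compare the prime count with the main term x^theta / log(x).
for x in (10**4, 10**5, 10**6):
    count = primes_in_short_interval(x)
    print(x, count, x**0.525 / math.log(x))
```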

Proof. Let $\alpha' = \alpha + r(x)$, where $r(x)$ is chosen so that $\alpha' < 0.7375$. Let $k = \lfloor \log_q x \rfloor$, so that $q^k \le x$, and let $l = \lceil 2(1-\alpha')k \rceil$. Consider the interval $[q^k - q^l, q^k - 1]$, which consists of the integers below $q^k$ whose first $k - l$ digits base $q$ are all equal to $q - 1$. Since $l \ge 2(1-\alpha')k > 0.525k$, the interval has length at least $(q^k)^{0.525}$, so by Baker, Harman and Pintz there will be
\[ \gg \frac{q^l}{k \log q} \]
primes in this interval, where the constant is explicit. Lemma 3 bounds the number of integers between $0$ and $q^l$ which have digit sum less than $(q-1)l\left(\frac12 - \delta\right)$. Letting $\delta = \frac{\log l}{\sqrt{l}}$, it follows that there are at most $q^l e^{-(\log l)^2}$ integers in the interval $[q^k - q^l, q^k - 1]$ whose digit sum is less than
\[ (q-1)(k-l) + (q-1)l\left(\frac12 - \frac{\log l}{\sqrt{l}}\right). \]



As $q^l e^{-(\log l)^2}$ is significantly smaller than $\frac{q^l}{k \log q}$, almost all of the primes in this interval will have a digit sum greater than the above, and so we see that there are
\[ \gg \frac{q^l}{k \log q} \]
primes with digit sum larger than
\[ \alpha'(q-1)\log_q(x) - (q-1)\sqrt{l}\,\log l. \]
Expanding $\alpha' = \alpha + r(x)$, and taking $r(x) = c\,\frac{\log\log x}{\sqrt{\log x}}$ for the appropriate constant $c$ yields a digit sum greater than
\[ \alpha(q-1)\log_q(x), \]

which proves the result since
\[ \frac{q^l}{k \log q} \gg \frac{x^{2(1-\alpha)}\, x^{-2r(x)}}{\log x} \gg x^{2(1-\alpha)} \exp\left(-c\sqrt{\log x}\,\log\log x\right). \]

The proof for the lower bound of the size of the corresponding set of primes with $s_q(p) < \beta(q-1)\log_q(x)$ for $0.2625 < \beta < \frac12$ is identical. □
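The construction in the proof can be carried out explicitly for small parameters. The sketch below builds the interval $[q^k - q^l, q^k - 1]$, lists its primes, and counts those whose digit sum is at least $(q-1)(k-l) + (q-1)l(\frac12 - \delta)$. Here $\delta$ is passed as a free parameter, an assumption made for the illustration: the choice $\delta = \log l / \sqrt{l}$ from the proof exceeds $\frac12$ for the small values of $l$ reachable by brute force, which makes the threshold vacuous. All function names are illustrative.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate only for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def digit_sum(n: int, q: int) -> int:
    """Return s_q(n), the sum of the base-q digits of n >= 0."""
    total = 0
    while n > 0:
        n, r = divmod(n, q)
        total += r
    return total


def top_interval_tail(q: int, k: int, alpha_prime: float,
                      delta: float) -> tuple[int, int, int, int]:
    """Return (lo, hi, number of primes in [lo, hi], number of those primes
    with digit sum >= (q-1)(k-l) + (q-1) l (1/2 - delta))."""
    l = math.ceil(2 * (1 - alpha_prime) * k)
    lo, hi = q**k - q**l, q**k - 1
    threshold = (q - 1) * (k - l) + (q - 1) * l * (0.5 - delta)
    primes = [n for n in range(lo, hi + 1) if is_prime(n)]
    above = sum(1 for p in primes if digit_sum(p, q) >= threshold)
    return lo, hi, len(primes), above


# Binary example: the top k - l bits of every integer in the interval are 1.
print(top_interval_tail(2, 24, 0.7, 0.1))
```

In this range most primes in the interval already clear the threshold, which reflects the sharp peak of the digit distribution on the last $l$ digits.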

Remark. The reader may note that for any $\alpha < 0.7375$ there are more possible choices for the first $k - l$ digits than all 1's. It is conceivable that by looking at multiple intervals in which the first $k - l$ digits contain many 1's, we could increase the density by a small factor, and possibly by a significant factor for smaller $\alpha$. While summing over multiple intervals seems natural, the resulting lower bound for the number of primes is roughly the same: the exponent of $x$ is no different, so we opted to present the simpler argument above.